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Abstract 

We present an efficient finite difference method for the computation of parameter sensitivities that 
is applicable to a wide class of continuous time Markov chain models. The estimator for the method is 
constructed by coupling the perturbed and nominal processes in a natural manner, and the analysis pro- 
ceeds by utilizing a martingale representation for the coupled processes. The variance of the resulting 
estimator is shown to be an order of magnitude lower due to the coupling. We conclude that the proposed 
method produces an estimator with a lower variance than other methods, including the use of Common 
Random Numbers, in most situations. Often the variance reduction is substantial. The method is no 
harder to implement than any standard continuous time Markov chain algorithm, such as "Gillespie's 
algorithm." The motivating class of models, and the source of our examples, are the stochastic chem- 
ical kinetic models commonly used in the biosciences, though other natural application areas include 
population processes and queuing networks. 

Keywords: Finite differences, variance reduction, parameter sensitivities, next reaction method, Gillespie, 
random time cliange, continuous time Markov chain, martingale. 



1 Introduction 

We develop a new finite difference method for the computation of parameter sensitivities that is applicable to 
a wide class of continuous time Markov chain models. For k G {1, . . . , M}, let C,k G denote the possible 
transition directions for a continuous time Markov chain, and let : M*^ — )• M denote the respective 
intensity, or propensity functions}^ The random time change representation of Kurtz for the model is 

X{t) = X(0) + Y.Yk (^1^ Xk{X{s))ds^ Ck, (1) 

where the are independent, unit-rate Poisson processes. See, for example, l fT3]| . Q Chapter 6 ], or the 
recent survey 15 ]. The infinitesimal generator for the model ([T]l is the operator A satisfying 

iAf)ix) = ^ Xkix){f{x + a) - fix)), 
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where / : M'' — )• M is chosen from a sufficiently large class of functions. Without loss of generality, we 
assume throughout that the state space of the process, S, is a subset of Z*^. 

Consider a family of models ([T]) indexed by a set of parameters, which we denote by the vector 9. 
Even when there are good theoretical reasons for believing the model is a reasonable description of some 
phenomenon, usually the parameters are not known precisely and have to be estimated experimentally. 
Depending on the setup and the parameters in question, it may be difficult to obtain good estimates. Thus, it 
is important to analyze how sensitive features of interest in the model are to variation in the parameters. For 
ease of exposition we take ^ to be a scalar, though note that it is trivial to extend all of the ideas of the paper 
to the setting of 6* G M^, for some £ > 0. 

We let / : 5 — )• M be a function of the state of the system that gives a measurement of interest. For ex- 
ample, / could be the abundance of one of the components at a particular time. Define J{9) = ¥,f{X^{t)), 
where the 6 dependence is being made explicit. The problem of interest is to efficiently approximate J' (9). 

There are a number of methods that can be used for the computation of such parameter sensitivities in 
this setting, including finite differences, likelihood ratios and Girsanov transformations, and infinitesimal 
perturbation analysis, each with its own benefits and drawbacks; see for example pSKTTKTTllTSl. We focus 
on finite difference methods, which due to its simplicity, is the most popular choice. Specifically, in this 
paper a new finite difference method is introduced that is easy to implement, analytically tractable, and 
typically produces an estimate with a given tolerance with substantially lower computational complexity 
than that obtained using the other methods currently known to the author. 

While continuous time Markov chain models of the general form ([T]l are used ubiquitously in both 
industry and the sciences to model natural phenomenon ranging from population processes to queueing 
networks, we feel the method developed here will be most useful in the study of stochastic models of 
biochemical reaction networks. We will therefore choose the language of biochemistry throughout, and also 
choose this area as the setting for our examples. 

A biochemical reaction network is a chemical system involving multiple reactions and chemical species. 
If the abundances of the constituent molecules of a reaction network are sufficiently high then their concen- 
trations are typically modeled by a coupled set of ordinary differential equations. If, however, the abun- 
dances are low then the standard deterministic models do not provide a good representation of the behavior 
of the system and stochastic models are used. The simplest stochastic models of such networks |[T2llT6ll treat 
the system as a continuous time Markov chain with the state, X, being the number of molecules of each 
species and with reactions modeled as possible transitions of the chain. More explicitly, if the kth reaction 
happens at time t, then the system is updated by the reaction vector (k, 

x{t) = x{t-) + Ck. 

Letting : M'^ — ^ M denote the intensity, or propensity, of the kth reaction, we see this stochastic model 
satisfies ([T]). 

As will be pointed out in the following sections, the strategy being proposed here is in some ways similar 
to the common reaction path (CRP) method proposed in [18 1, which is also a quite capable estimator for 
finite differences. However, there are important differences. First, the actual coupling, and hence simulation, 
of the relevant processes is different. The coupling proposed here tends to provide an estimator with a lower 
variance, especially when the process is considered for moderate to large times. Second, the coupling 
proposed here lends itself to analysis more readily than that used in [18] as the centered counting processes 
used in our coupling are martingales with respect to the natural filtration of the process 0| . Third, the method 
being proposed here is as easy to implement as the usual Gillespie algorithm or next reaction method. The 
strategy employed in fTS'l, on the other hand, requires being quite careful with the seeds of the pseudo- 
random number generators used since one independent seed is required per reaction channel per sample 
path generated. Finally, the coupling proposed here essentially converts the problem of generating two 
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paths of a continuous time Markov chain into a problem of generating one path of a different continuous 
time Markov chain with an enlarged state space. Therefore, all analytical and computational techniques 
developed for the study of continuous time Markov chains, of which there are many, will be employable on 
this larger system, and therefore applicable to the problem of computing sensitivities. 

The most common finite difference coupling used today for the approximation of sensitivities is proba- 
bly an implementation of Gillespie's algorithm plus using common random numbers (CRN). We will also 
discuss this coupling and give the relevant stochastic representation of it. We will conclude that the coupling 
proposed here will produce a lower variance estimator than CRN for the same reasons that it produces a 
lower variance estimator than the CRP scheme of ifTSl . 

The main goals of this paper are to introduce the new method and to provide the mathematical analysis 
of the expected squared difference between the two relevant processes, though some relevant examples 
will also be provided. In Section [2j we formally introduce our mathematical model of interest, including 
all technical assumptions. In Section [3} we develop our new finite difference estimator and provide sharp 
analytical bounds. We also discuss the long time behavior of the introduced estimator and compare it with 
both CRP and CRN. We conclude that the proposed method will be quite superior for moderate and large 
time scales. In Section[4j we provide examples demonstrating our main results. 



2 The Formal Setup 

We consider the family of models 

X%t) = X\0) + Yyk (^l\i{X%s))ds^ Ck, (2) 

where the Yk are independent, unit-rate Poisson processes, the vector 9 represents a given choice of param- 
eters that we are making explicit in the notation, and all other notation is as before. The assumption that 
there are a finite number of possible jump directions can almost certainly be weakened. However, this 
assumption makes the analysis significantly cleaner and all the motivating models (such as those arising 
from biochemistry) naturally satisfy such a condition. We define : Z"' — )• M'^ by 

k 

We make the following running assumptions throughout the remainder of the paper. The first is that the 
intensity functions are uniformly (in 6) globally Lipschitz. The second is that the intensity functions scale 
at least linearly with perturbations to 9. 

Assumption 1. We suppose that there is a Ki > for which 

|A^,(x) - Xl{y)\ + \F\x) - F'{y)\ <K^\x-y\, 



for all k, 9 of interest, and all x,y G S. 

Assumption 2. We suppose there is a K2 > so that for all k and all e < 1, 



sup 



A^^(x)-A^,(x)| + |F'^+^(x)-F^(x 



< K2e. 



Assumption [T]can almost certainly be weakened to a local Lipschitz condition, in which case analytical 
methods similar to those found in ||9i| and/or [15] can be applied. Proving our main results in such generality. 
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while possible and certainly worth doing in future work, will be significantly messier and we feel the main 
points of the analysis will be lost. Note that Assumption [T] automatically holds if 5 is a bounded set. In 
the chemical setting, mild assumptions on the intensity functions ensure that the non-negative orthant is 
forward invariant. Therefore, S is bounded if there is a vector ^ € for which ^ • Cfc ^ for all k. For 
example, such a ^ exists if mass is conserved. Assumption [T] also holds if the intensity functions are simply 
set to zero outside of a compact subset of Z'^, which has the same effect as analyzing the standard model up 
until a stopping time r, defined to be the time the processes leaves a given compact set (essentially using a 
localization argument). 

Another relevant situation in which Assumption [T] holds is when the equivalent scaled models are ana- 
lyzed. While we point the reader to ||2j [3l IH [TOl for a thorough description of this model in the biosciences, 
we will briefly discuss it here. For some parameter of the system, N, we let denote the process with ith 
component X^^ = Xi/N°'\ where the > are chosen so that X^^ = 0(1). Under mild assumptions on 
the intensity functions Afc, it can be shown that X^ satisfies 

=X^(0) + (^*iV'^^+'''=-A,(X^(5))ds) Cf , (3) 

where C^. = Ck,i/N"\ and Pk is chosen so that Xk{X^{-)) = 0(1). This scaled model is 0(1) and 
therefore more readily satisfies Assumption [T] where now it is understood that the state space is 

= {zeM.'^\zi = XiiV-"%rc G S}. 

In many applications, it is more relevant to compute expectations and parameter sensitivities of the scaled 
model Q than the unsealed version ([T]l, though we do not revisit this point in the current paper. 



2.1 The basic problem and the benefits of variance reduction 

We let / : Z>q — M be a function of the state of the system which gives a measurement of interest and 
define 

J (9) =E/(X^(t)). 

The problem of interest is to efficiently estimate J' (9), where we recall that we are making the simplifying 
assumption that 9 is one-dimensional. 

To estimate J' (9) the centered finite difference is often used: 



as its bias is O(e^) O. That is, 

J'{9) = ^-"^^ + 0(e 

e 

This should be compared with the forward difference, which has a bias of 0(e 



F.f(X'+''^(t))-JLJ(X''-'Ht)) , ^, 2, 



jW = M^!!!W)_M^!W)+o(.). 



The estimator for Q using centered finite differences is 



R 



R 

i=l 
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with 

d[i\[e) = , (6) 

where X^j represents the ith path generated with parameter choice 6, and R is the number of paths generated. 

If X^^^^'^{t) and Xj^j "^^^(0 are computed independently, the variance of (i[j](e) is 0(e^^), and, hence, the 
variance of Dji{e) is 0(i?~^e"^). Note that 

E(Z)^(e) - J'(^))2 = Var(Dfi(e)) + {EDaie) - J'{e)f 
= 0{R-\-^)+0{e^), 

which, for a given R, is minimized when e = 0(i?^^/^), at a value that is 0(R^'^/^). Therefore, the optimal 
convergence rate to the exact value, in the sense of confidence intervals, is 0{R^^^^) IS. 

Many computations are performed with a target variance (which yields a target size of the confidence 
interval). Denoting the target variance by V* , we see that the number of paths required is then approximated 
by the solution to 

=:^Var(d(6)) = y* =^ i?=-Lvar(d(6)). 

Thus, decreasing the variance of d(e) lowers the computational complexity (total number of computations) 
required to solve the problem. The basic idea of coupling, in the context of this paper, is to lower the 
variance of d{e) by simulating X^+'^Z^ and X^^'^/'^ simultaneously so that the two processes are highly 
correlated or "coupled." That is, instead of generating paths independently, we want to generate a pair of 
paths (X^+'/2,X^"'/2) so that for appropriate choices of /, the variance of /(X^'+'Z^) - f{X^-^/^) is 
reduced. The basic idea of any such coupling is to reuse, or share, some portion of the driving "noise" in the 
generation of each process. As already alluded to in the Introduction, one such finite difference method that 
achieved a substantial reduction in variance due to coupling can be found in [18], which we discuss in more 
detail in later sections. 

we will develop a new coupling technique so that the variance of (e) in (|6]) is 0(e^^), 



In Section 
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a full order of magnitude lower (in e) than when the paths were generated independently. This will lead 
to a finite difference method with an optimal convergence rate, in the sense of the above paragraph, of 
0(-R~^/^), achieved when e = 0{R^^^^). More importantly, however, the variance of the estimator Q 
will be 0{R^^er^), which should be compared with a variance of 0{R^^e~^) when independent paths are 
used. Thus, the number of paths (and computational complexity) required to solve a given problem will be 
reduced by an order of e. 



3 Coupled finite differences 

In Section |3.1[ we discuss how to couple the requisite processes for the coupled finite difference method 
being proposed here. In Section [3]2j we provide sharp bounds on the variance of the estimator. 

3.1 Coupling the processes 

Whether using the forward or centered difference, the main problem is to intelligently produce two paths 
generated from systems whose parameters differ by an order of e. A good coupling should satisfy three 
things: 
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(i) it should minimize the variance of the difference ([6]), 



(ii) it should be easy to simulate, and 
(iii) it should be analytically tractable. 

We will show that the coupling Q below satisfies each of these requirements, however we begin by moti- 
vating the coupling by two simpler problems that capture the core idea. 

We consider the problem of trying to understand the difference between Zi{t) and Z2{t), where Zi, Z2 
are Poisson processes with rates 13.1 and 13, respectively. We let Yi and Y2 be independent unit-rate Poisson 
processes, and set 

Zi{t) = Yi{m) + Y2i0.it) 
Z2{t) = Yi{m), 

where we use the additivity property of Poisson processes. The important point to note is that both processes 
Zi and Z2 are using the process yi(13t) to generate simultaneous jumps. The process Zi then uses the 
auxiliary process 12(0. It) to jump the extra times that Z2 does not. The processes Zi, Z2 will jump together 
the vast majority of times, and in this way be very tightly coupled]^ The coupling above also already hints 
at the main points of the mathematical analysis that will be carried out in Section 3.2 as 

Zi{t)- Z2{t) = Y2iO.lt), 

and so, 

E\Zi{t) - Z2{t)\ = EY2iO.lt) = O.lt 
E(Zi(t) - Z2{t))'^ = EYiiO.lt)"^ = O.lt + O.Olt^ 

More generally, if Zi and Z2 are non-homogeneous Poisson processes with intensities f{t) and g{t), 
respectively, then we could let Y\,Y2, and Y-^ be independent, unit-rate Poisson processes and define 



= Yi [J^ f{s) A g{s)ds^ + Y2 (^^ 



' f{s)-{f{s)Ag{s))ds 



Z2{t) = Yi( / fis)Ag{s)ds\+Y3i / g{s) - {/(s) A gis)) ds 





where we are using that, for example, 

Yi f{s) A g{s)ds^ + Y2 (^1^ f{s) - {f{s) A g{s)) ds^ ^ Y (^J^ f{s)ds 

where y is a unit rate Poisson process and we define aAb = min{a, b}. Thus, we are coupling the processes 
by splitting up the intensity functions into two pieces, one shared 

f{s)Ag{s), 

and the other not, and then using the same noise, Yi , on the shared portion. 

We return to the problem of coupling the main processes of interest to us. For ease of notation, we 
will couple the processes X^"*"*^ and as opposed to and X^^^/"^, with the understanding that 



^In this case, the long-run percentage of jumps that are shared can be quantified precisely as 13/(13 + 0.1) ~ 0.99923. 
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generating the centered difference is performed in the obvious manner. We generate our coupled processes 
(X^+SX") via: 

k ^rj^ 

X\t) = X\0) + J2 Yk,i ^f^ A^^(X^+^(s)) A Xl{X\s))ds^ a 

+ E^M (^I^XiiX'is)) - Xl+^{X'^^{s)) A XiiX%s))ds^ Ck, 

where the j are unit-rate Poisson processes and all other notation is as before. Thus, and just as in the 
example pertaining to the non-homogeneous Poisson processes above, the effect of the intensity function 
A^^"" on the process X^^^ has been split into two pieces: one of size A^^''(X^+''(s)) A Xl{X^{s))ds, and 
one of size 

A^^(X^+^(.)) - A^^(X^+^(.)) A Xi{X<^{s))ds. 
Further, since the two processes X^^*^ and X^ share the contribution of each of the terms with intensity 

Xl+^{X'+^{s))AXiiX%s))ds 

we expect them to be highly correlated. It is important to note that the marginal processes have the same 
distributions as the respective processes generated via Q. This fact can be seen by noting that Q is a 
continuous time Markov chain and that the transition rates of the marginal processes are identical to those 
of ([2]) with the corresponding rate constants. Note also that the coupling Q is essentially the same as in the 
toy problems above where we coupled Zi and Z2. 

A coupling similar to Q first appeared in |T4'|. More recently, it was used in Q to study the strong 
error of different approximation methods in the discrete stochastic case, and in f3] to generate paths so as 
to apply multi-level Monte Carlo techniques in the continuous time Markov chain setting. The application 
of the coupling (|7]l towards the problem of parametric sensitivity analysis is the main contribution of this 
paper. 

As discussed at the end of Section [l] the process X^) satisfying ([T]) is a continuous time Markov 

chain with state space Z'^ x 'Z'^. Therefore, all analytical and computational techniques developed for the 
study of continuous time Markov chains will be applicable to this system, and, hence, to the problem of 
computing sensitivities. 

Before proceeding with the analysis, we give the algorithm for generating a path via (|7]l. 

We note that the method below is the next reaction method applied to (|7]) [1, 8|. See [IJ for a thorough 
explanation of how the next reaction method is equivalent to simulating representations of the forms consid- 
ered here. Below, we will denote a uniform[0, 1] random variable by rand(0, 1), and we remind the reader 
that if [7 ~ rand(0, 1), then ln(l/[/) is an exponential random variable with a parameter of one. All random 
variables generated are assumed to be independent of each other and all previous random variables. It is 
assumed that the processes start with the same initial condition, though this can be weakened in the obvious 
manner. Finally, we note that it is also possible to simulate the continuous time Markov chain Q by the 
obvious adaption of Gillespie's direct, or optimized direct, algorithm. While we do not formally provide 
that algorithm here, it will be problem specific as to which implementation (Gillespie versus next reaction 
method) is more efficient. 
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Algorithm 1 (Simulation of the representation (|7]l). Initialize. Set X^"*"*^ = = x and t = 0. For each k 
and i G {1, 2, 3}, set 

• Pk,i = ln(l/ufe,i), where Uk,i is rand(0, 1). 

• Tk,i = 0. 

Repeat the following steps: 
(i) For each k, set 

(ii) For each k and i G {1, 2, 3}, set 



At 



k,i 



{Pk,i - Tk,i)/Ak,i, if Ak,i > 
oo, ifAfc_i = 



(iii) Set A = minfc j{ At^. j}, and let /i = {fc, i} be the indices where the minimum is achieved. 

(iv) Set t = t + A. 

(v) Update state vectors according to reaction (^^ (where minimum occurred in step {in)): 

( (x^+%x^) + (a,Cfc), if^ = i 

{X'+%X') = \ {X'+^,X') + {Ck,0), Hi = 2 . 

( (X^+%X^) + (0,a), ifi = 3 

(vi) For each k and i G {1, 2, 3}, set T^ i = T^^i + A^^i x A. 
{vii) Set = + ln(l/M), where u is rand(0, 1). 
{via) Return to step (i) or quit. 

Note that at most two of A^^i, Ak^2, A^^z will be non-zero at each step. Further, it will often be that 
Afc 1 » maxj^fc 2 7 ^fc.s} and the processes will move together the vast majority of the time (which is, of 
course, the whole point of such a coupling), showing that the cost of generating the path X^) will 

be less than the cost of generating two paths via the representation ([2]). This fact is observed in the data 
collected on the numerical examples in Section|4] 

The Common Reaction Path method 

We now revisit the point that the strategy being proposed here is similar to the one proposed in lITSll . where 
instead of the coupling Q the authors used what is equivalent to 



x'{t) = x\Q) + Y,yk{^f^)i{x 



s))ds Ck 



{s))ds Ck 



(8) 
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where the are independent unit-rate Poisson processes and all other notation is as before. The key point 
is that they are using the same Poisson processes for the generation of each path. As stated in Section [T] the 
estimator built with paths generated via ^ is quite capable in many circumstances. The main differences 
between their method and the one being proposed in this paper via (|7]l are: 

1. The processes generated via (|7]) are generally coupled tighter than those generated via ([8]), resulting 
in a lower variance for the estimator, sometimes substantially so. However, sometimes the coupling 
([8]l produces a lower variance estimator than (|7]l when the terminal time T is small. These facts will 
be demonstrated via example in Section]?] and discussed more below. 

2. The model (|7]l is more amenable to analysis as the centered counting processes of Q are martingales 
with respect to the natural filtration m, which is not the case for (]8]). 

3. Implementation of Q is simpler than that of Q as Q does not require the generation of many 
independent seeds for the pseudo-random number generator. In fact, simulation of (]7]l is no more 
challenging than simulating any continuous time Markov chain. 

4. The coupling (|7]l makes the problem of computing the difference between two paths into one of 
computing a single path of a different continuous time Markov chain with an enlai^ged state space. 

The following example is chosen to highlight the advantages of the coupling Q over that of (J8]l. 

Example 1. Consider the simple model in which an mRNA molecule is created and degraded 

i M, (9) 

0.1 

which is equivalent to an M/M /oo queue with arrival rate 9 and service rate 0.1. Here we are using the 
common convention of putting the rate constant of a reaction next to the corresponding reaction vector. We 
suppose that we want to understand the sensitivity of the expected number of mRNA molecules with respect 
to the parameter 6^2. We consider how the different representations Q and Q "should" behave on this 
model, whose representation via (]2]) is 

X^{t) = X(0) + Yi {et) - Y2 0.lX^{s)ds^ . (10) 

For e > 0, let X^) satisfy (]7]), which for this example is 

X^+'{t) = X^+'{0) + Yi,i{et) + Yi,2{et) 

-Y2,i (^j\.lX'{s)ds^ -Y2,2 (^J^O.l{X'+'{s)-X'{s))ds^ 

X^{t) = X^O) + Yi^iiet) - Y2,i (^J^ 0.1X\s)ds^ , 

where we have used that with this coupling > X^{t), for all t > if e > 0. Therefore, assuming 

that X^+^{0) = X^{0), we have 

X'+'{t) - X'{t) = Yi,2{et) - Y2,2 0.l{X'+'{s) - X'{s))ds 
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Setting Z^'^ = X^'^'^ — X^, we see that Z^''^ itself can be viewed as the solution to ( 10), though with zero 



initial condition and input rate e. The mean and variance can be solved for as functions of time and satisfy 



KZ^'^it) = E{X'-^'{t) - X''{t)) = ^(1 - e-°-") (11) 
Var(Z^'^(t)) = yar{X^+'{t) - = ^(1 - e-°-"). (12) 

On the other hand, if X^) satisfy the coupling then 

X^+'{t) = X^+'{0) + Yi{9t + et) - Y2 (^J^ 0.lX'^+'{s)ds 
X^{t) = X^O) + Yi{dt) - Y2 (^J^ 0.1X^{s)ds 

and 

X'^+'{t) - X^{t) = Yi{et + et) - Yi{et) 



Y2 (^j\lX^+'{s)ds^ -Y2 (^j\lX\s)ds 



(13) 



In this case, we still have that ¥,[X^'^{t) — X (t)] satisfies the right hand side of ( [TT] ). However, the variance 



can not be calculated with such ease as for ( [T2] ). We note, however, for large t, we will have 6t + et ^ 6t, 
and therefore anticipate 



0.1X^+'{s)ds^ / 0.lX%s)ds, 
Jo 



implying that the two processes X and X should decouple, and behave independently. This is demon- 
strated in a numerical example in Section [4] where we show the variance of the difference ( [T3] ) converges to 
40, which is the same as if X^^'^ and X^vere generated independently, and substantially larger than the 



bound given in ( [12] ) for small e. 

As will be discussed immediately below, we also expect to see this "decoupling" when Gillespie's algo- 
rithm is implemented with common random numbers (CRN). This also will be demonstrated by example in 
Section m □ 

The above example gives a heuristic as to why the coupling ^ will tend to give a lower variance than 
([8]). For processes generated by ([T]), whenever X^'^'^{t) w X^{t) during the course of the simulation the 
processes have re-coupled, regardless of the the history of the process up to that time. On the other hand, if 
X^'^'^ and X^ are generated via ([8]), then X^^"^ X^ need not imply 

\l+^{X<^+^{s))ds^ f \l+\X\s)ds), 
Jo 

and so the two processes could be exploring completely different portions of the Poisson processes. Thus, 
even when X^+^{t) = X^{t) in the course of the simulation of ([8]), the integrated intensities will not be 
equal and so the processes are no longer coupled as tightly as they were at time zero. As time increases, this 
problem could get worse and the processes can decouple completely (as happens in the example above). 

Common Random Numbers and Gillespie's algorithm 

The standard method of using common random numbers in the implementation of Gillespie's algorithm will 
suffer the same defect as ([8]) in that for large times the coupled processes can decouple. To understand why. 
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we need to give the correct representation for Gillespie's algorithm (which is equivalent to simulating the 
embedded discrete time Markov chain). The following representation can be found in [51. We define 



^o{x) = ^Afc(j;), and qk{x) = ^ Ai(x)/Ao(x). 



i=l 



Let y be a unit rate Poisson process and let {^j} be an i.i.d. sequence of uniform(0,l) random variables that 
are also independent of Y. Then let X satisfy 



(14) 



Ro{t) = Y(^J^ Xo{X{s))d6 

X{t) = X{0) + \Ck / l(9fc_i(X(s-)),(7fc(X(s-)](CRo{s-))c^-Ro(s)- 
k 

The counting process Rq is determining the jump times, which are seen to be exponential random variables 
with parameter Xq{X{s—)). The uniform random variables are then used to select which reaction occurs, 
with the fcth reaction being chosen with probability Xk{X{s—))/XQ{X{s—)). Simulation of ( 14l is called 
Gillespie's algorithm (or simply: simulating the embedded discrete time Markov chain). The standard com- 
mon random number + Gillespie algorithm finite difference method, which is probably the most common 
coupling method used today in the context of finite differences, then consists of using the same Y and choice 
of {^j} for the construction of X^ and X^^"^, and will decouple for the same reasons as that of ([8]). This is 
demonstrated numerically in an example in Section]?] 

Naive couplings: a cautionary tale 

At this point it may be tempting to try to couple the processes generated via Q even tighter by using the 
same Poisson processes for each of 1^ 2 and 3. This would, in effect, be a highbred version of the 
Common Reaction Path and Coupled Finite Difference methods. That is, one may be tempted to use 

=X^+^(0) + ^n,i (^l\l+^iX'^%s))AXi{X%s))ds^ a 

+ E ^^.2 Xt'^iX'+^s)) - A^^(X^+^(.)) A Xi{X\s))ds^ Ck 

\ . (15) 

= X^(0) + ^n,i ^ Xl^^{X'^^{s))AXi{X%s))ds^ Ck 

+ E ^^.2 Xl{X'{s)) - Xl+^{X'+^{s)) A Xl{X'^is))ds^ Ck, 

where we are now using the same Poisson process for all of the auxiliary processes. In fact, this does 
not work: the marginal distributions of {X^^'^ , X^) as generated by (JTSj are not the same as the original 
processes and so this coupling should not be used. In fact, the marginal distributions can be so different 
that the coupled processes will converge to the wrong value as e — )■ 0. This fact is best demonstrated by an 
example. 

Example 2. Consider the system arising from the single reaction 
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with Xq = 1. The stochastic equation of the form ([T]l governing this system is 



Hence, the process can only take the values of one and zero and EX^(l) = exp{— which implies 

-^EX^(l) = -exp{-e}. In particular, 

du 



For the coupling ( [151 ) for this model we have 

which converges to zero as e — )• 0. Perhaps the simplest way to compute the above expectation is to find 
the condition on the first jump times of the underlying Poisson processes that guarantee the random variable 
X^+'^Z^ — X^"*^/^ takes a value of negative one. This event has a probability of O(e^), implying the result. 
This computation is left to the interested reader. □ 

3.2 Analytical results 

The following theorem is the main analytical result of this paper and allows us to conclude that for any 
function / satisfying the assumptions of Theorem [T] and (X^^*^, X^) satisfying (|7]) 

Var [f{X'^\t)) - f{X\t))) < Ctj,Me, 

for some Ctj^M > depending upon t, f, and M (the number of reactions). 

Theorem 1. Suppose {X^^"", X^) satisfy Q with our running Assumptions^and^ Let f : ^ M.be 
a function with bounded first derivative on all x £ S. Then, for any T > there is a Ctj,m > Ofor 
which 

Esup (fiX'^^t)) - f{X'it))y < CTj,Me. 

t<T ^ ' 

We provide two Lemmas, giving the and I? bound on the difference between X^^'^ and X^ , before 
proving Theorem [T] 

Lemma 2. Suppose (X^+'^j X^) satisfy ([7]) with our running Assumptions [l] and [2] Then, for T > there 
is a Ct m > for which 



Esup 

t<T 



X^+'{t)-X%t) 

t<T 

Proof Let T > 0. For any s > 

X^+'{s) - = M^''{s) + F^+'{X^+'{r)) - F%X\r))dr, (16) 

where M^'*^ is a martingale with quadratic covariation 
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where 



Therefore, for s < t 



\X^+'{s) - X\s)\ < sup \M^''{r)\ + r |F^+^(X^+^(r)) - F\X^+' {r))\dr 

r<t Jo 

+ r \F\X^+\r)) - F\x\r))\dr 
Jo 

< sup |M^''(r)| + K2€s + Ki [ |X^+'(r) - X'^{r)\dr 

r<t Jo 

< sup \M^'%r)\ + Kaet + i^i /" sup \X^+'{u) - X^{u)\dr, 

r<t Jo u<r 

where Ki , K2 are the constants of Assumption [T] and [2| respectively. As the above inequaUty holds for all 
s < i, we have 

sup - X^{s)\ < sup \M^''{r)\ + K2et + Ki [ sup \X^^'{u) - X^{u)\dr. (17) 

s<t r<t Jo u<r 

By the Burkholder-Davis-Gundy inequality and the fact that ^/z < z for all nonnegative integers, we have 
the existence of a C2 > for which 

Esup|M^'^(r)| < CaV [\\Xl+'{X^+'{u)) - Xi{X\u))\du 

r<t ^ Jo 

<C2Y,\ f nxl^\X'+\u)) - \i{X'+^{u))\du+ f n\l{X<^+%u)) - \l{X\u))\du 

< Cset + C4 / ¥.\X^+^{u) - X\u)\du 
Jo 

<C3€t + C4 [ E sup IX'^^' (u) - X%u)\dr, 

Jo u<r 

where C3 and C4 are constants independent of e or T, and depend linearly on M (the number of reactions). 
Taking expectations of ( fTTj ), applying ( [T8| ), and using Gronwall's inequality gives the desired result. 

□ 

Lemma 3. Suppose {X^^"^, X^) satisfy Q with our running Assumptions [l] and [2] Then, for T > there 
is a Cr M > for which 



E sup 

t<T 



X^+'{t)-X\t) 



< Ct,m(-- 



Proof. Returning to (16 ) in the proof of Lemma [2| we have that 

|X^+^(s) - X\s)\'^ < 2|M^'^(s)|2 + 2T r \F^+'{X^+'{r)) - F'^{X^{r))\'^dr. 

Jo 



The proof is now essentially the same as that of Lemma [2j though ( l^i in combination with Lemma [2] is 
used to bound the martingale term. □ 
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Note that the expected difference of the pth moment, for any p > 1, can be estimated in the same manner 
as above. 

Example 3. To demonstrate the above Lemmas, we consider the following simple example: 

9 As, 

where the parameter of interest is the rate constant 6. For this example we have that A^(x) = 9, and so 

X'+^{t) = Y,{dt) + Y2{et) 
X\t) = Yi{9t). 

Hence X^^'^{t) — X^{t) = Y2{et), and the statements of Lemmas |2] and [s] follow immediately. Further, the 
lemmas are shown to be sharp. 

Proof, (of Theorem [T]) By Taylor's theorem combined with our assumption on /, we have that for some 

Cf>0 

- fix'it)))' < Cf\X'+Ht) - x'it)\^ 

and the result follows by application of Lemma [3] □ 

We return now to ([5]) and ([6]) and note that with X^+'^Z^, X^^*^/^ generated via the coupling proposed 
here, 

Var(d[i](e)) = 0(e-i) 
Var(l)fi(e)) = 0(i?-ie-i), 

which are an order of magnitude lower, in terms of e, than the respective variances when the processes 
are generated independently. As discussed at the end of Section |2. 1[ this fact leads to a decrease in the 
computational work (and simulation time) required to solve a given problem to a desired tolerance by a 
factor of e, yielding potentially enormous savings. 



4 Numerical examples 

We compare our method with existing methods on three different models: a basic model for the production 
of mRNA and proteins, an M/M/oo queue, and a genetic toggle switch from |18|. Because the common 
reaction path method of [18] tends to perform at least as well as the usual implementation of common 
random numbers with Gillespie's algorithm, we choose to only include the Common Reaction Path method 
in our comparison (except for one plot in Numerical Example 2 to demonstrate the decoupling alluded to at 



the end of Section 3.1 



Numerical Example 1. Consider the model of gene transcription and translation 

gAg + M, M^M + P, mA0, pA0, (19) 

where a single gene is being translated into mRNA, which is then being transcribed into proteins. The final 
two reactions represent degradation of the mRNA and protein molecules, respectively. Assuming that there 
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is a single gene copy, the stochastic equation ([T]l for this model is 



X\t)=X^{0) + Yi{2t) 
ft 



X^{s)ds 







+ Y2 


-1 



10Xf{s)ds 



+ Y3 



exi{s)ds 



(20) 



where Xi{t) and give the number of mRNA and protein molecules at time t, respectively, and Yi, ^2 

are independent unit-rate Poisson processes. Suppose the rate constant 6 is of interest to us and we believe 
that 9 1/4. We would like to estimate the sensitivity of the mean number of protein molecules at time 
T = 30, say, with respect to the parameter 1/4. Here, it is a straightforward calculation to find that 



EX|(30) 



6»=l/4 



79.941 and -^EX^(30) 
do 



-318.073, 



0=1/4 



if the initial condition is X^iQ) = [0, 0]^. Defining 



.y;(3o) , 

our goal is to efficiently estimate J(l/4) and we compare the following methods on this problem: 



{i) the usual crude Monte Carlo (CMC) estimator with independent samples, 
[ii) the common reaction path (CRP) method of fTSl using the coupled processes ([8]l, 
{iii) the coupled finite difference (CFD) method being proposed in this paper using the coupling (|7]l, 
{iv) a Girsanov transformation method of Plyasunov and Arkin detailed in ifTTl . 

For all simulations, we assume an initial condition of zero mRNA and zero protein molecules. We will 
denote the number of sample paths used in the construction of the relevant estimators (|5]) via R and the 
perturbation in the centered finite difference via e. 

In Table [1} we provide the 95% confidence intervals computed using the crude Monte Carlo method 
with independent paths (CMC), the common reaction path method (CRP), and the coupled finite difference 
method (CFD) for different choices of R, the number of paths simulated, and e, the perturbation of 9. For 
each method and choice of R, we also provide: (i) an approximate total number of steps (and random num- 
bers used) over the course of the entire simulation, and {ii) an approximation of the CPU time required. 
These numbers, which quantify the computational work required from each method, are essentially inde- 
pendent of e, and the numbers provided are the average of the actual values for the two different values of 
e for a given method and choice of R. These numbers should be used as a reference for the computational 
complexity required by the different methods with the understanding that CPU time will depend greatly 
upon implementation (the author used Matlab for all computations, which were performed on an Apple ma- 
chine with a 2.2 GHz Intel 17 processor). In Table[2j we provide similar data for the Girsanov transformation 
method of [17]. 

While Tables [T] and [2] demonstrate that the method being proposed here can produce a more accurate 
estimate in less CPU time than the other methods, a more important statistic is the CPU time needed for 
each method to achieve a desired tolerance. Therefore, we applied each method until the 95% confidence 
interval was it 6.0. The finite difference methods were applied with a perturbation size of e = 1/40. Table 
[3] provides the number of updates needed by each method combined with the CPU time needed on our 
machine. We see that the coupled finite difference (CFP) method was approximately 9 times more efficient 
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Method 


R 


e = 1/20 


e = 1/100 


# updates 


CPU time 


CMC 


1,000 


-276.2 ± 46.3 


-472.7.1 ± 237.3 


8.4 X 10^ 


W9.6S 


CRP 


1,000 


-323.1 ± 18.4 


-321.0 ±60.2 


8.4 X 10^ 


« 10.1 S 


CFD 


1,000 


-323.7 ± 8.7 


-333.8 ± 28.0 


4.4 X 10^ 


«6.5S 


CMC 


10,000 


-324.8 ± 14.7 


-305.4 ± 74.3 


w 8.4 X 10'^ 


«98.8S 


CRP 


10,000 


-325.5 ± 5.8 


-328.6 ± 18.6 


8.4 X 10'^ 


w 105.4 S 


CFD 


10,000 


-320.0 ± 2.8 


-316.6 ± 8.9 


« 4.4 X 10' 


« 64.9 S 


CMC 


40,000 


-322.7 ± 7.5 


-341.9 ± 37.3 


w 3.4 X 10^ 


395.3 S 


CRP 


40,000 


-319.6 ± 2.9 


-310.6 ± 9.3 


w 3.4 X 10^ 


RJ411.5S 


CFD 


40,000 


-321.6 ± 1.4 


317.8 ±4.4 


^ 1.8 X 10^ 


« 263.3 S 



Table 1: 95% confidence intervals and computational complexity for (i) crude Monte Carlo (CMC), {ii) the 
common reaction path (CRP) method of [18], and {Hi) the coupled finite difference method (CFD) proposed 
here, applied to ([19]) in order to approximate J(l/4) for different choices of R and e. The exact value is 
J(l/4) = —318.073. Note that the bias of the centered finite difference is apparent when e = 1/20 and 
R = 40,000. 



R 


Approximation 


# updates 


CPU Time 


1,000 


-441.5 ± 156.5 


4.2 X 10^ 


5.2 S 


10,000 


-324.2 ± 49.2 


4.2 X lO'^ 


54.5 S 


40,000 


-327.8 ±25.1 


1.7 X IQS 


207.6 S 



Table 2: 95% confidence intervals and computational complexity for the Girsanov transformation method 
of 111 71 applied to ([19]) in order to approximate J(l/4). The exact value is J(l/4) = —318.073. 



Method 


R 


Approximation 


# updates 


CPU Time 


Girsanov 


689,600 


-312.1 ± 6.0 


2.9 X 10*^ 


3,506.6 S 


CMC 


246,000 


-319.3 ± 6.0 


2.1 X 10*^ 


2,364.8 S 


CRP 


25,980 


-316.7 ± 6.0 


2.2 X 10*^ 


270.9 S 


CFD 


4,580 


-319.9 ± 6.0 


2.0 X 10'^ 


29.2 S 



Table 3: Required R, # updates, and CPU time for each method to provide a 95% confidence of ± 6.0. Each 
finite difference method used e = 1/40. The exact value is J(l/4) = —318.073. 
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that the common reaction path method, and vastly more efficient than both the Girsanov transformation 
method and the crude Monte Carlo with independent samples. 

Next, we simulated the system pO] ) 5,000 times using each of the different methods and plotted the vari- 
ance of the estimators versus time up to T = 60, see Figure[T] The finite difference methods were computed 
with a perturbation of size e = 1/40. We note that the variance of each of the finite difference methods 
appears to converge, though the limiting value for the coupled finite difference method being proposed here 
converges to a value that is approximately 6.5 times lower than that of the common reaction path method, 
and 52 times lower than crude Monte Carlo. Also note that the variance of the Girsanov transform method 
grows Unearly in time, as expected, and is quite large for even moderate values of time, t. 

□ 

Numerical Example 2. We revisit Example [1} which modeled an mRNA molecule being created and de- 
graded (or an M /M / oo queue), 

e 

^ M. (21) 

0.1 

We suppose that we want to understand the sensitivity of the expected number of particles (or customers, 
in the queuing setting) with respect to the parameter 9 ^ 2. In Figure [2] we provide a plot of the variances 
of the different estimators as functions of time. So as to demonstrate the different behaviors of the different 
estimators, the scales on both the time and variance axes are dramatically different for the different methods. 
For each of the methods, we chose R = 1,000, and used e = 1/100 for the perturbation methods. Recall 
that in Example [l] we proved that the variance of the difference between X^^"^ and will converge to 
e/0.1 if they are coupled using (|7]). Therefore, the variance of the estimator (|5]) will converge to 

e 1 1 _ 1 1 
(U~e^R ~ ROTe' 

as t — )• oo. In our case, e = 1/100 and R = 1,000, and the above value is equal to one. This predicted 
behavior is born out in Figure (2el. Also in Example [TJ we predicted (though did not prove) that after a 



long enough time the variance of both the common reaction path estimator (CRP) and Gillespie's algorithm 
plus common random numbers (CRN) should converge to the variance of the crude Monte Carlo estimator 
constructed with independent paths. In essence, we are predicting that the processes will decouple after a 
long enough time and behave independently. This behavior is demonstrated in Figures ( |2a| and ( [2c| ) (for 
CRP) and ( |2f| ) (for Gillespie -i- CRN), though we note that the time for a full decoupling is quite large in 
this example. Also note that we plotted the variance of the estimator for the common reaction path method 
both up to time T = 100 and T = 10,000 so as to demonstrate the different behaviors exhibited. Finally, 
we point out that the full "decoupling" of the CRP method described here does not seem to take place in 
Example 1. The estimator built using the Girsanov transformation method exhibits a variance that grows 
linearly in time. 

Next, we considered the sensitivity to the decay parameter at 0.1. That is we considered 

iai±M, (22) 
e 

with 6 = 0.1. In Figure |3] we provide a plot of the variance of the Coupled Finite Difference estimator 
versus the Common Reaction Path estimator. Each plot was generated using 1 ,000 sample paths in which 
a perturbation of e = 1/100 was used. We again see the lower variance exhibited by the Coupled Finite 
Difference Estimator, though the difference is now less dramatic. □ 

Numerical Example 3. We consider a model for a genetic toggle switch found in ifTSll : 

Xi{X) A3(X) 

^ Xi, ^ X2, (23) 

A2(X) A4(X) 
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500 




10 20 30 40 50 60 
Time 

(c) Gil'sanov Transformation 



Figure 1: Variance of the different estimators applied to ( |20| ). For each, R = 5,000 sample paths were 
used to construct the relevant estimators. For each of the finite difference methods (figures ( [Ta| and (fTb])), a 
perturbation of e = 1/40 was used. Note that the scales on the variance axis are dramatically different for 
the different methods. 
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40 60 80 100 
Time 



20 

(a) Crude Monte Carlo 




450 



20 40 60 80 100 
Time 

(b) Girsanov Transformation 




2000 4000 6000 8000 10000 
Time 



(c) Common Reaction Path, T 
10,000 




40 60 80 100 
Time 

(d) Common Reaction Path, T = 
100 




20 40 60 80 100 
Time 

(e) Coupled Finite Differences 




2000 4000 6000 8000 10000 
Time 

(f) Gillespie + Common Random Num- 
bers 



Figure 2: Variance of the different estimators applied to ( |2T] ). R = 1,000 sample paths were used to construct 
the relevant estimators. For each of the finite difference methods (plots ( |2a| , ( |2c| ), pd] ), (2ei, and ([2f|)), a 
perturbation of e = 1/ 100 was used. So as to demonstrate the different behaviors of the different estimators, 
the scales on both the time and variance axes are dramatically different for the different methods. Also, note 
that we plotted the variance of the estimator for the common reaction path method both up to time T = 100 
and T = 10,000 so as to demonstrate the different behaviors exhibited. Note that the CRP and Gillespie + 
Common Random Number methods appear equivalent for this example. 
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Figure 3: Time plot of the variance of the Coupled Finite Difference estimator versus the Common Reaction 



Path estimator for the model (22i with decay rate perturbed. Each plot was generated using 1,000 sample 



paths. A perturbation of e = 1/100 was used, 
with intensity functions 



and parameter choice 



ai = 50, a2 = 16, /3 = 2.5, 7 = 1. 



We begin the process with initial condition [0, 0] and consider the sensitivity of Xi as a function of ai. In 
Figure |4] we provide a plot of the variance of the Coupled Finite Difference estimator versus the Common 
Reaction Path estimator as a function of time. Each plot was generated using 10,000 sample paths in which a 
perturbation of e = 1/10 was used. We see that the Coupled Finite Difference method performs substantially 
better for times t > 5, whereas the Common Reaction Path method performs better for shorter times, t < 5. 
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Figure 4: Time plot of the variance of the Coupled Finite Difference estimator versus the Common Reaction 
Path estimator for the model (23l. Each plot was generated using 10,000 sample paths. A perturbation 
of e = 1/10 was used. It is worth noting that here the Common Reaction Path estimator outperforms the 
Coupled Finite Difference estimator for times t < 5, though CFD still greatly outperforms CRP for larger 
times. 
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